{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Theano, Lasagne\n",
    "and why they matter\n",
    "\n",
    "\n",
    "### got no lasagne?\n",
    "Install the __bleeding edge__ version from here: http://lasagne.readthedocs.org/en/latest/user/installation.html"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Warming up\n",
    "* Implement a function that computes the sum of squares of numbers from 0 to N\n",
    "* Use numpy or python\n",
    "* An array of numbers 0 to N - numpy.arange(N)"
   ]
  },
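  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For reference, one straightforward numpy version of the warm-up (a sketch; `sum_squares_np` is just an illustrative name):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np",
    "\n",
    "\n",
    "\n",
    "def sum_squares_np(N):",
    "\n",
    "    # square the integers 0..N-1 and add them up",
    "\n",
    "    return (np.arange(N, dtype=np.int64) ** 2).sum()"
   ]
  },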
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np",
    "\n",
    "\n",
    "\n",
    "def sum_squares(N):",
    "\n",
    "    return < student.Implement_me() >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "% % time",
    "\n",
    "sum_squares(10**8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# theano teaser\n",
    "\n",
    "Doing the very same thing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import theano",
    "\n",
    "import theano.tensor as T"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "\n",
    "# I gonna be function parameter",
    "\n",
    "N = T.scalar(\"a dimension\", dtype='int32')",
    "\n",
    "\n",
    "\n",
    "# i am a recipe on how to produce sum of squares of arange of N given N",
    "\n",
    "result = (T.arange(N)**2).sum()",
    "\n",
    "\n",
    "# Compiling the recipe of computing \"result\" given N",
    "\n",
    "sum_function = theano.function(inputs=[N], outputs=result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "% % time",
    "\n",
    "sum_function(10**8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# How does it work?\n",
    "* 1 You define inputs f your future function;\n",
    "* 2 You write a recipe for some transformation of inputs;\n",
    "* 3 You compile it;\n",
    "* You have just got a function!\n",
    "* The gobbledegooky version: you define a function as symbolic computation graph.\n",
    "\n",
    "\n",
    "* There are two main kinвs of entities: \"Inputs\" and \"Transformations\"\n",
    "* Both can be numbers, vectors, matrices, tensors, etc.\n",
    "* Both can be integers, floats of booleans (uint8) of various size.\n",
    "\n",
    "\n",
    "* An input is a placeholder for function parameters.\n",
    " * N from example above\n",
    "\n",
    "\n",
    "* Transformations are the recipes for computing something given inputs and transformation\n",
    " * (T.arange(N)^2).sum() are 3 sequential transformations of N\n",
    " * Doubles all functions of numpy vector syntax\n",
    " * You can almost always go with replacing \"np.function\" with \"T.function\" aka \"theano.tensor.function\"\n",
    "   * np.mean -> T.mean\n",
    "   * np.arange -> T.arange\n",
    "   * np.cumsum -> T.cumsum\n",
    "   * and so on.\n",
    "   * builtin operations also work that way\n",
    "   * np.arange(10).mean() -> T.arange(10).mean()\n",
    "   * Once upon a blue moon the functions have different names or locations (e.g. T.extra_ops)\n",
    "     * Ask us or google it\n",
    " \n",
    " \n",
    "Still confused? We gonna fix that."
   ]
  },
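  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sanity check of the np -> T correspondence (`.eval()` evaluates a graph that needs no inputs on the spot):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# same recipe in both worlds: numpy computes eagerly, theano builds a graph first",
    "\n",
    "print(np.arange(10).mean())        # eager numpy result",
    "\n",
    "print(T.arange(10).mean().eval())  # symbolic graph, evaluated immediately"
   ]
  },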
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inputs",
    "\n",
    "example_input_integer = T.scalar(\"scalar input\", dtype='float32')",
    "\n",
    "\n",
    "# dtype = theano.config.floatX by default",
    "\n",
    "example_input_tensor = T.tensor4(\"four dimensional tensor input\")",
    "\n",
    "# не бойся, тензор нам не пригодится",
    "\n",
    "\n",
    "\n",
    "input_vector = T.vector(\"my vector\", dtype='int32')  # vector of integers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Transformations",
    "\n",
    "\n",
    "# transofrmation: elementwise multiplication",
    "\n",
    "double_the_vector = input_vector*2",
    "\n",
    "\n",
    "# elementwise cosine",
    "\n",
    "elementwise_cosine = T.cos(input_vector)",
    "\n",
    "\n",
    "# difference between squared vector and vector itself",
    "\n",
    "vector_squares = input_vector**2 - input_vector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Practice time:",
    "\n",
    "# create two vectors of size float32",
    "\n",
    "my_vector = student.init_float32_vector()",
    "\n",
    "my_vector2 = student.init_one_more_such_vector()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write a transformation(recipe):",
    "\n",
    "#(vec1)*(vec2) / (sin(vec1) +1)",
    "\n",
    "my_transformation = student.implementwhatwaswrittenabove()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(my_transformation)",
    "\n",
    "# it's okay it aint a number"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# What's inside the transformation",
    "\n",
    "theano.printing.debugprint(my_transformation)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Compiling\n",
    "* So far we were using \"symbolic\" variables and transformations\n",
    " * Defining the recipe for computation, but not computing anything\n",
    "* To use the recipe, one should compile it"
   ]
  },
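  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal end-to-end compile, independent of the exercise above (`demo_x` and `demo_f` are just illustrative names):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# input -> recipe -> compiled function: f(x) = x**2 + 1",
    "\n",
    "demo_x = T.vector('demo_x', dtype='float64')",
    "\n",
    "demo_f = theano.function([demo_x], demo_x**2 + 1)",
    "\n",
    "print(demo_f([1, 2, 3]))  # -> [  2.   5.  10.]"
   ]
  },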
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "inputs = [ < two vectors that my_transformation depends on > ]",
    "\n",
    "outputs = [ < What do we compute (can be a list of several transformation) > ]",
    "\n",
    "\n",
    "# The next lines compile a function that takes two vectors and computes your transformation",
    "\n",
    "my_function = theano.function(",
    "\n",
    "    inputs, outputs,",
    "\n",
    "    # automatic type casting for input parameters (e.g. float64 -> float32)",
    "\n",
    "    allow_input_downcast=True",
    "\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# using function with, lists:",
    "\n",
    "print \"using python lists:\"",
    "\n",
    "print my_function([1, 2, 3], [4, 5, 6])",
    "\n",
    "print",
    "\n",
    "\n",
    "# Or using numpy arrays:",
    "\n",
    "# btw, that 'float' dtype is casted to secong parameter dtype which is float32",
    "\n",
    "print \"using numpy arrays:\"",
    "\n",
    "print my_function(np.arange(10),",
    "\n",
    "                  np.linspace(5, 6, 10, dtype='float'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Debugging\n",
    "* Compilation can take a while for big functions\n",
    "* To avoid waiting, one can evaluate transformations without compiling\n",
    "* Without compilation, the code runs slower, so consider reducing input size\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# a dictionary of inputs",
    "\n",
    "my_function_inputs = {",
    "\n",
    "    my_vector: [1, 2, 3],",
    "\n",
    "    my_vector2: [4, 5, 6]",
    "\n",
    "}",
    "\n",
    "\n",
    "# evaluate my_transformation",
    "\n",
    "# has to match with compiled function output",
    "\n",
    "print my_transformation.eval(my_function_inputs)",
    "\n",
    "\n",
    "\n",
    "# can compute transformations on the fly",
    "\n",
    "print(\"add 2 vectors\", (my_vector + my_vector2).eval(my_function_inputs))",
    "\n",
    "\n",
    "#!WARNING! if your transformation only depends on some inputs,",
    "\n",
    "# do not provide the rest of them",
    "\n",
    "print(\"vector's shape:\", my_vector.shape.eval({",
    "\n",
    "    my_vector: [1, 2, 3]",
    "\n",
    "}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* When debugging, it's usually a good idea to reduce the scale of your computation. E.g. if you train on batches of 128 objects, debug on 2-3.\n",
    "* If it's imperative that you run a large batch of data, consider compiling with mode='debug' instead"
   ]
  },
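  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A sketch of that trade-off, assuming you have filled in `inputs` and `outputs` above (FAST_COMPILE and DebugMode are standard Theano modes):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# FAST_COMPILE skips most graph optimization: quick to compile, slower to run",
    "\n",
    "quick_function = theano.function(inputs, outputs,",
    "\n",
    "                                 allow_input_downcast=True,",
    "\n",
    "                                 mode='FAST_COMPILE')",
    "\n",
    "\n",
    "print(quick_function([1, 2, 3], [4, 5, 6]))",
    "\n",
    "\n",
    "# mode='DebugMode' additionally self-checks the graph at every step (very slow)"
   ]
  },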
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Your turn: Mean Squared Error (2 pts)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Quest #1 - implement a function that computes a mean squared error of two input vectors",
    "\n",
    "# Your function has to take 2 vectors and return a single number",
    "\n",
    "\n",
    "<student.define_inputs_and_transformations() >",
    "\n",
    "\n",
    "compute_mse = <student.compile_function() >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Tests",
    "\n",
    "from sklearn.metrics import mean_squared_error",
    "\n",
    "\n",
    "for n in [1, 5, 10, 10**3]:",
    "\n",
    "\n",
    "    elems = [np.arange(n), np.arange(n, 0, -1), np.zeros(n),",
    "\n",
    "             np.ones(n), np.random.random(n), np.random.randint(100, size=n)]",
    "\n",
    "\n",
    "    for el in elems:",
    "\n",
    "        for el_2 in elems:",
    "\n",
    "            true_mse = np.array(mean_squared_error(el, el_2))",
    "\n",
    "            my_mse = compute_mse(el, el_2)",
    "\n",
    "            if not np.allclose(true_mse, my_mse):",
    "\n",
    "                print('Wrong result:')",
    "\n",
    "                print('mse(%s,%s)' % (el, el_2))",
    "\n",
    "                print(\"should be: %f, but your function returned %f\" %",
    "\n",
    "                      (true_mse, my_mse))",
    "\n",
    "                raise ValueError(\"Что-то не так\")",
    "\n",
    "\n",
    "print(\"All tests passed\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Shared variables\n",
    "\n",
    "* The inputs and transformations only exist when function is called\n",
    "\n",
    "* Shared variables always stay in memory like global variables\n",
    " * Shared variables can be included into a symbolic graph\n",
    " * They can be set and evaluated using special methods\n",
    "   * but they can't change value arbitrarily during symbolic graph computation\n",
    "   * we'll cover that later;\n",
    " \n",
    " \n",
    "* Hint: such variables are a perfect place to store network parameters\n",
    " * e.g. weights or some metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# creating shared variable",
    "\n",
    "shared_vector_1 = theano.shared(np.ones(10, dtype='float64'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# evaluating shared variable (outside symbolicd graph)",
    "\n",
    "print(\"initial value\", shared_vector_1.get_value())",
    "\n",
    "\n",
    "# within symbolic graph you use them just as any other inout or transformation, not \"get value\" needed"
   ]
  },
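  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A quick sketch of that last point - the shared variable plugs straight into a graph:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# the graph reads the shared variable's current value by itself",
    "\n",
    "print((shared_vector_1 * 2).eval())"
   ]
  },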
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# setting new value",
    "\n",
    "shared_vector_1.set_value(np.arange(5))",
    "\n",
    "\n",
    "# getting that new value",
    "\n",
    "print(\"new value\", shared_vector_1.get_value())",
    "\n",
    "\n",
    "# Note that the vector changed shape",
    "\n",
    "# This is entirely allowed... unless your graph is hard-wired to work with some fixed shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Your turn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Write a recipe (transformation) that computes an elementwise transformation of shared_vector and input_scalar",
    "\n",
    "# Compile as a function of input_scalar",
    "\n",
    "\n",
    "input_scalar = T.scalar('coefficient', dtype='float32')",
    "\n",
    "\n",
    "scalar_times_shared = <student.write_recipe() >",
    "\n",
    "\n",
    "\n",
    "shared_times_n = <student.compile_function() >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print \"shared:\", shared_vector_1.get_value()",
    "\n",
    "\n",
    "print \"shared_times_n(5)\", shared_times_n(5)",
    "\n",
    "\n",
    "print \"shared_times_n(-0.5)\", shared_times_n(-0.5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Changing value of vector 1 (output should change)",
    "\n",
    "shared_vector_1.set_value([-1, 0, 1])",
    "\n",
    "print \"shared:\", shared_vector_1.get_value()",
    "\n",
    "\n",
    "print \"shared_times_n(5)\", shared_times_n(5)",
    "\n",
    "\n",
    "print \"shared_times_n(-0.5)\", shared_times_n(-0.5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# T.grad - why theano matters\n",
    "* Theano can compute derivatives and gradients automatically\n",
    "* Derivatives are computed symbolically, not numerically\n",
    "\n",
    "Limitations:\n",
    "* You can only compute a gradient of a __scalar__ transformation over one or several scalar or vector (or tensor) transformations or inputs.\n",
    "* A transformation has to have float32 or float64 dtype throughout the whole computation graph\n",
    " * derivative over an integer has no mathematical sense\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_scalar = T.scalar(name='input', dtype='float64')",
    "\n",
    "\n",
    "scalar_squared = T.sum(my_scalar**2)",
    "\n",
    "\n",
    "# a derivative of v_squared by my_vector",
    "\n",
    "derivative = T.grad(scalar_squared, my_scalar)",
    "\n",
    "\n",
    "fun = theano.function([my_scalar], scalar_squared)",
    "\n",
    "grad = theano.function([my_scalar], derivative)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt",
    "\n",
    "%matplotlib inline",
    "\n",
    "\n",
    "\n",
    "x = np.linspace(-3, 3)",
    "\n",
    "x_squared = list(map(fun, x))",
    "\n",
    "x_squared_der = list(map(grad, x))",
    "\n",
    "\n",
    "plt.plot(x, x_squared, label=\"x^2\")",
    "\n",
    "plt.plot(x, x_squared_der, label=\"derivative\")",
    "\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Why that rocks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "my_vector = T.vector('float64')",
    "\n",
    "\n",
    "# Compute the gradient of the next weird function over my_scalar and my_vector",
    "\n",
    "# warning! Trying to understand the meaning of that function may result in permanent brain damage",
    "\n",
    "\n",
    "weird_psychotic_function = ((my_vector+my_scalar)**(1+T.var(my_vector)) + 1./T.arcsinh(my_scalar)).mean()/(my_scalar**2 + 1) + 0.01*T.sin(2*my_scalar**1.5)*(",
    "\n",
    "    T.sum(my_vector) * my_scalar**2)*T.exp((my_scalar-4)**2)/(1+T.exp((my_scalar-4)**2))*(1.-(T.exp(-(my_scalar-4)**2))/(1+T.exp(-(my_scalar-4)**2)))**2",
    "\n",
    "\n",
    "\n",
    "der_by_scalar, der_by_vector = <student.compute_grad_over_scalar_and_vector() >",
    "\n",
    "\n",
    "\n",
    "compute_weird_function = theano.function(",
    "\n",
    "    [my_scalar, my_vector], weird_psychotic_function)",
    "\n",
    "compute_der_by_scalar = theano.function([my_scalar, my_vector], der_by_scalar)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plotting your derivative",
    "\n",
    "vector_0 = [1, 2, 3]",
    "\n",
    "\n",
    "scalar_space = np.linspace(0, 7)",
    "\n",
    "\n",
    "y = [compute_weird_function(x, vector_0) for x in scalar_space]",
    "\n",
    "plt.plot(scalar_space, y, label='function')",
    "\n",
    "y_der_by_scalar = [compute_der_by_scalar(x, vector_0) for x in scalar_space]",
    "\n",
    "plt.plot(scalar_space, y_der_by_scalar, label='derivative')",
    "\n",
    "plt.grid()",
    "\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Almost done - Updates\n",
    "\n",
    "* updates are a way of changing shared variables at after function call.\n",
    "\n",
    "* technically it's a dictionary {shared_variable : a recipe for new value} which is has to be provided when function is compiled\n",
    "\n",
    "That's how it works:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Multiply shared vector by a number and save the product back into shared vector",
    "\n",
    "\n",
    "inputs = [input_scalar]",
    "\n",
    "outputs = [scalar_times_shared]  # return vector times scalar",
    "\n",
    "\n",
    "my_updates = {",
    "\n",
    "    # and write this same result bach into shared_vector_1",
    "\n",
    "    shared_vector_1: scalar_times_shared",
    "\n",
    "}",
    "\n",
    "\n",
    "compute_and_save = theano.function(inputs, outputs, updates=my_updates)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "shared_vector_1.set_value(np.arange(5))",
    "\n",
    "\n",
    "# initial shared_vector_1",
    "\n",
    "print(\"initial shared value:\", shared_vector_1.get_value())",
    "\n",
    "\n",
    "# evaluating the function (shared_vector_1 will be changed)",
    "\n",
    "print(\"compute_and_save(2) returns\", compute_and_save(2))",
    "\n",
    "\n",
    "# evaluate new shared_vector_1",
    "\n",
    "print(\"new shared value:\", shared_vector_1.get_value())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Logistic regression example (4 pts)\n",
    "\n",
    "Implement the regular logistic regression training algorithm\n",
    "\n",
    "Tips:\n",
    "* Weights fit in as a shared variable\n",
    "* X and y are potential inputs\n",
    "* Compile 2 functions:\n",
    " * train_function(X,y) - returns error and computes weights' new values __(through updates)__\n",
    " * predict_fun(X) - just computes probabilities (\"y\") given data\n",
    " \n",
    " \n",
    "We shall train on a two-class MNIST dataset\n",
    "* please note that target y are {0,1} and not {-1,1} as in some formulae"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.datasets import load_digits",
    "\n",
    "mnist = load_digits(2)",
    "\n",
    "\n",
    "X, y = mnist.data, mnist.target",
    "\n",
    "\n",
    "\n",
    "print(\"y [shape - %s]:\" % (str(y.shape)), y[:10])",
    "\n",
    "\n",
    "print(\"X [shape - %s]:\" % (str(X.shape)))",
    "\n",
    "print(X[:3])",
    "\n",
    "print(y[:10])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# inputs and shareds",
    "\n",
    "shared_weights = <student.code_me() >",
    "\n",
    "input_X = <student.code_me() >",
    "\n",
    "input_y = <student.code_me() >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "predicted_y = <predicted probabilities for input_X >",
    "\n",
    "\n",
    "\n",
    "loss = <logistic loss(scalar, mean over sample) >",
    "\n",
    "\n",
    "\n",
    "grad = <gradient of loss over model weights >",
    "\n",
    "\n",
    "\n",
    "updates = {",
    "\n",
    "    shared_weights: < new weights after gradient step >",
    "\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "train_function = <compile function that takes X and y, returns log loss and updates weights >",
    "\n",
    "\n",
    "predict_function = <compile function that takes X and computes probabilities of y >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.cross_validation import train_test_split",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.metrics import roc_auc_score",
    "\n",
    "\n",
    "for i in range(5):",
    "\n",
    "    loss_i = train_function(X_train, y_train)",
    "\n",
    "    print(\"loss at iter %i:%.4f\" % (i, loss_i))",
    "\n",
    "    print(\"train auc:\", roc_auc_score(y_train, predict_function(X_train)))",
    "\n",
    "    print(\"test auc:\", roc_auc_score(y_test, predict_function(X_test)))",
    "\n",
    "\n",
    "\n",
    "print(\"resulting weights:\")",
    "\n",
    "plt.imshow(shared_weights.get_value().reshape(8, -1))",
    "\n",
    "plt.colorbar()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# lasagne\n",
    "* lasagne is a library for neural network building and training\n",
    "* it's a low-level library with almost seamless integration with theano\n",
    "\n",
    "For a demo we shall solve the same digit recognition problem, but at a different scale\n",
    "* images are now 28x28\n",
    "* 10 different digits\n",
    "* 50k samples"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mnist import load_dataset",
    "\n",
    "X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()",
    "\n",
    "\n",
    "print X_train.shape, y_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import lasagne",
    "\n",
    "\n",
    "input_X = T.tensor4(\"X\")",
    "\n",
    "\n",
    "# input dimention (None means \"Arbitrary\" and only works at  the first axes [samples])",
    "\n",
    "input_shape = [None, 1, 28, 28]",
    "\n",
    "\n",
    "target_y = T.vector(\"target Y integer\", dtype='int32')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Defining network architecture"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Input layer (auxilary)",
    "\n",
    "input_layer = lasagne.layers.InputLayer(shape=input_shape, input_var=input_X)",
    "\n",
    "\n",
    "# fully connected layer, that takes input layer and applies 50 neurons to it.",
    "\n",
    "# nonlinearity here is sigmoid as in logistic regression",
    "\n",
    "# you can give a name to each layer (optional)",
    "\n",
    "dense_1 = lasagne.layers.DenseLayer(input_layer, num_units=50,",
    "\n",
    "                                    nonlinearity=lasagne.nonlinearities.sigmoid,",
    "\n",
    "                                    name=\"hidden_dense_layer\")",
    "\n",
    "\n",
    "# fully connected output layer that takes dense_1 as input and has 10 neurons (1 for each digit)",
    "\n",
    "# We use softmax nonlinearity to make probabilities add up to 1",
    "\n",
    "dense_output = lasagne.layers.DenseLayer(dense_1, num_units=10,",
    "\n",
    "                                         nonlinearity=lasagne.nonlinearities.softmax,",
    "\n",
    "                                         name='output')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# network prediction (theano-transformation)",
    "\n",
    "y_predicted = lasagne.layers.get_output(dense_output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# all network weights (shared variables)",
    "\n",
    "all_weights = lasagne.layers.get_all_params(dense_output)",
    "\n",
    "print(all_weights)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Than you could simply\n",
    "* define loss function manually\n",
    "* compute error gradient over all weights\n",
    "* define updates\n",
    "* But that's a whole lot of work and life's short\n",
    "  * not to mention life's too short to wait for SGD to converge\n",
    "\n",
    "Instead, we shall use Lasagne builtins"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Mean categorical crossentropy as a loss function - similar to logistic loss but for multiclass targets",
    "\n",
    "loss = lasagne.objectives.categorical_crossentropy(",
    "\n",
    "    y_predicted, target_y).mean()",
    "\n",
    "\n",
    "# prediction accuracy",
    "\n",
    "accuracy = lasagne.objectives.categorical_accuracy(",
    "\n",
    "    y_predicted, target_y).mean()",
    "\n",
    "\n",
    "# This function computes gradient AND composes weight updates just like you did earlier",
    "\n",
    "updates_sgd = lasagne.updates.sgd(loss, all_weights, learning_rate=0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# function that computes loss and updates weights",
    "\n",
    "train_fun = theano.function(",
    "\n",
    "    [input_X, target_y], [loss, accuracy], updates=updates_sgd)",
    "\n",
    "\n",
    "# function that just computes accuracy",
    "\n",
    "accuracy_fun = theano.function([input_X, target_y], accuracy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### That's all, now let's train it!\n",
    "* We got a lot of data, so it's recommended that you use SGD\n",
    "* So let's implement a function that splits the training sample into minibatches"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# An auxilary function that returns mini-batches for neural network training",
    "\n",
    "\n",
    "# Parameters",
    "\n",
    "# X - a tensor of images with shape (many, 1, 28, 28), e.g. X_train",
    "\n",
    "# y - a vector of answers for corresponding images e.g. Y_train",
    "\n",
    "# batch_size - a single number - the intended size of each batches",
    "\n",
    "\n",
    "# What do need to implement",
    "\n",
    "# 1) Shuffle data",
    "\n",
    "# - Gotta shuffle X and y the same way not to break the correspondence between X_i and y_i",
    "\n",
    "# 3) Split data into minibatches of batch_size",
    "\n",
    "# - If data size is not a multiple of batch_size, make one last batch smaller.",
    "\n",
    "# 4) return a list (or an iterator) of pairs",
    "\n",
    "# - (подгруппа картинок, ответы из y на эту подгруппу)",
    "\n",
    "\n",
    "\n",
    "def iterate_minibatches(X, y, batchsize):",
    "\n",
    "\n",
    "    <return an iterable of(X_batch, y_batch)  batches of images and answers for them >",
    "\n",
    "\n",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "#",
    "\n",
    "# You feel lost and wish you stayed home tonight?",
    "\n",
    "# Go search for a similar function at",
    "\n",
    "# https://github.com/Lasagne/Lasagne/blob/master/examples/mnist.py"
   ]
  },
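  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a reference point, here is one possible implementation (a sketch in the spirit of the Lasagne mnist example linked above - do try it yourself first):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def iterate_minibatches(X, y, batchsize):",
    "\n",
    "    # shuffle a single index array so X and y stay aligned",
    "\n",
    "    indices = np.random.permutation(len(X))",
    "\n",
    "    for start in range(0, len(X), batchsize):",
    "\n",
    "        batch_idx = indices[start:start + batchsize]  # the last batch may be smaller",
    "\n",
    "        yield X[batch_idx], y[batch_idx]"
   ]
  },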
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training loop"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import time",
    "\n",
    "\n",
    "num_epochs = 100  # amount of passes through the data",
    "\n",
    "\n",
    "batch_size = 50  # number of samples processed at each function call",
    "\n",
    "\n",
    "for epoch in range(num_epochs):",
    "\n",
    "    # In each epoch, we do a full pass over the training data:",
    "\n",
    "    train_err = 0",
    "\n",
    "    train_acc = 0",
    "\n",
    "    train_batches = 0",
    "\n",
    "    start_time = time.time()",
    "\n",
    "    for batch in iterate_minibatches(X_train, y_train, batch_size):",
    "\n",
    "        inputs, targets = batch",
    "\n",
    "        train_err_batch, train_acc_batch = train_fun(inputs, targets)",
    "\n",
    "        train_err += train_err_batch",
    "\n",
    "        train_acc += train_acc_batch",
    "\n",
    "        train_batches += 1",
    "\n",
    "\n",
    "    # And a full pass over the validation data:",
    "\n",
    "    val_acc = 0",
    "\n",
    "    val_batches = 0",
    "\n",
    "    for batch in iterate_minibatches(X_val, y_val, batch_size):",
    "\n",
    "        inputs, targets = batch",
    "\n",
    "        val_acc += accuracy_fun(inputs, targets)",
    "\n",
    "        val_batches += 1",
    "\n",
    "\n",
    "    # Then we print the results for this epoch:",
    "\n",
    "    print(\"Epoch {} of {} took {:.3f}s\".format(",
    "\n",
    "        epoch + 1, num_epochs, time.time() - start_time))",
    "\n",
    "\n",
    "    print(",
    "\n",
    "        \"  training loss (in-iteration):\\t\\t{:.6f}\".format(train_err / train_batches))",
    "\n",
    "    print(\"  train accuracy:\\t\\t{:.2f} %\".format(",
    "\n",
    "        train_acc / train_batches * 100))",
    "\n",
    "    print(\"  validation accuracy:\\t\\t{:.2f} %\".format(",
    "\n",
    "        val_acc / val_batches * 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_acc = 0",
    "\n",
    "test_batches = 0",
    "\n",
    "for batch in iterate_minibatches(X_test, y_test, 500):",
    "\n",
    "    inputs, targets = batch",
    "\n",
    "    acc = accuracy_fun(inputs, targets)",
    "\n",
    "    test_acc += acc",
    "\n",
    "    test_batches += 1",
    "\n",
    "print(\"Final results:\")",
    "\n",
    "print(\"  test accuracy:\\t\\t{:.2f} %\".format(",
    "\n",
    "    test_acc / test_batches * 100))",
    "\n",
    "\n",
    "if test_acc / test_batches * 100 > 99:",
    "\n",
    "    print(\"Achievement unlocked: 80lvl Warlock!\")",
    "\n",
    "else:",
    "\n",
    "    print(\"We need more magic!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A better network ( 4+ pts )\n",
    "\n",
    "\n",
    "* The quest is to create a network that gets at least 99% at test set\n",
    " * In case you tried several architectures and have a __detailed__ report - 97.5% \"is fine too\". \n",
    " * __+1 bonus point__ each 0.1% past 99%\n",
    " * More points for creative approach\n",
    " \n",
    "__ There is a mini-report at the end that you will have to fill in. We recommend to read it first and fill in while you are iterating. __\n",
    " \n",
    "\n",
    "## Tips on what can be done:\n",
    "\n",
    "\n",
    "\n",
    " * Network size\n",
    "   * MOAR neurons, \n",
    "   * MOAR layers, \n",
    "   * Convolutions are almost imperative\n",
    "   * Пх'нглуи мглв'нафх Ктулху Р'льех вгах'нагл фхтагн! \n",
    "   \n",
    "   \n",
    "   \n",
    " * Regularize to prevent overfitting\n",
    "   * Add some L2 weight norm to the loss function, theano will do the rest\n",
    "   * Can be done manually or via - http://lasagne.readthedocs.org/en/latest/modules/regularization.html\n",
    "   \n",
    "   \n",
    "   \n",
    " * Better optimization - rmsprop, nesterov_momentum, adadelta, adagrad and so on.\n",
    "   * Converge faster and sometimes reach better optima\n",
    "   * It might make sense to tweak learning rate, other learning parameters, batch size and number of epochs\n",
    "   \n",
    "   \n",
    "   \n",
    " * Dropout - to prevent overfitting\n",
    "   * `lasagne.layers.DropoutLayer(prev_layer, p=probability_to_zero_out)`\n",
    "   \n",
    "   \n",
    "   \n",
    " * Convolution layers\n",
    "   * `network = lasagne.layers.Conv2DLayer(prev_layer,`\n",
    "    `                       num_filters = n_neurons,`\n",
    "    `                        filter_size = (filter width, filter height),`\n",
    "    `                        nonlinearity = some_nonlinearity)`\n",
    "   * Warning! Training convolutional networks can take long without GPU.\n",
    "     * If you are CPU-only, we still recomment to try a simple convolutional architecture\n",
    "     * a perfect option is if you can set it up to run at nighttime and check it up at the morning.\n",
    " \n",
    " * Plenty other layers and architectures\n",
    "   * http://lasagne.readthedocs.org/en/latest/modules/layers.html\n",
    "   * batch normalization, pooling, etc\n",
    "   \n",
    "   \n",
    " * Nonlinearities in the hidden layers\n",
    "   * tanh, relu, leaky relu, etc\n",
    "   \n",
    " \n",
    " \n",
    "   \n",
    "There is a template for your solution below that you can opt to use or throw away and write it your way"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mnist import load_dataset",
    "\n",
    "X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()",
    "\n",
    "\n",
    "print X_train.shape, y_train.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import lasagne",
    "\n",
    "\n",
    "input_X = T.tensor4(\"X\")",
    "\n",
    "\n",
    "# input dimention (None means \"Arbitrary\" and only works at  the first axes [samples])",
    "\n",
    "input_shape = [None, 1, 28, 28]",
    "\n",
    "\n",
    "target_y = T.vector(\"target Y integer\", dtype='int32')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Input layer (auxilary)",
    "\n",
    "input_layer = lasagne.layers.InputLayer(shape=input_shape, input_var=input_X)",
    "\n",
    "\n",
    "<student.code_neural_network_architecture() >",
    "\n",
    "\n",
    "dense_output = <your network output >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Network predictions (theano-transformation)",
    "\n",
    "y_predicted = lasagne.layers.get_output(dense_output)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# All weights (shared-varaibles)",
    "\n",
    "# \"trainable\" flag means not to return auxilary params like batch mean (for batch normalization)",
    "\n",
    "all_weights = lasagne.layers.get_all_params(dense_output, trainable=True)",
    "\n",
    "print(all_weights)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# loss function",
    "\n",
    "loss = <loss function >",
    "\n",
    "\n",
    "# <optionally add regularization>",
    "\n",
    "\n",
    "accuracy = <mean accuracy score for evaluation >",
    "\n",
    "\n",
    "# weight updates",
    "\n",
    "updates = <try different update methods >"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A function that accepts X and y, returns loss functions and performs weight updates",
    "\n",
    "train_fun = theano.function(",
    "\n",
    "    [input_X, target_y], [loss, accuracy], updates=updates_sgd)",
    "\n",
    "\n",
    "# A function that just computes accuracy given X and y",
    "\n",
    "accuracy_fun = theano.function([input_X, target_y], accuracy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# итерации обучения",
    "\n",
    "\n",
    "num_epochs = <how many times to iterate over the entire training set >",
    "\n",
    "\n",
    "batch_size = <how many samples are processed at a single function call >",
    "\n",
    "\n",
    "for epoch in range(num_epochs):",
    "\n",
    "    # In each epoch, we do a full pass over the training data:",
    "\n",
    "    train_err = 0",
    "\n",
    "    train_acc = 0",
    "\n",
    "    train_batches = 0",
    "\n",
    "    start_time = time.time()",
    "\n",
    "    for batch in iterate_minibatches(X_train, y_train, batch_size):",
    "\n",
    "        inputs, targets = batch",
    "\n",
    "        train_err_batch, train_acc_batch = train_fun(inputs, targets)",
    "\n",
    "        train_err += train_err_batch",
    "\n",
    "        train_acc += train_acc_batch",
    "\n",
    "        train_batches += 1",
    "\n",
    "\n",
    "    # And a full pass over the validation data:",
    "\n",
    "    val_acc = 0",
    "\n",
    "    val_batches = 0",
    "\n",
    "    for batch in iterate_minibatches(X_val, y_val, batch_size):",
    "\n",
    "        inputs, targets = batch",
    "\n",
    "        val_acc += accuracy_fun(inputs, targets)",
    "\n",
    "        val_batches += 1",
    "\n",
    "\n",
    "    # Then we print the results for this epoch:",
    "\n",
    "    print(\"Epoch {} of {} took {:.3f}s\".format(",
    "\n",
    "        epoch + 1, num_epochs, time.time() - start_time))",
    "\n",
    "\n",
    "    print(",
    "\n",
    "        \"  training loss (in-iteration):\\t\\t{:.6f}\".format(train_err / train_batches))",
    "\n",
    "    print(\"  train accuracy:\\t\\t{:.2f} %\".format(",
    "\n",
    "        train_acc / train_batches * 100))",
    "\n",
    "    print(\"  validation accuracy:\\t\\t{:.2f} %\".format(",
    "\n",
    "        val_acc / val_batches * 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "test_acc = 0",
    "\n",
    "test_batches = 0",
    "\n",
    "for batch in iterate_minibatches(X_test, y_test, 500):",
    "\n",
    "    inputs, targets = batch",
    "\n",
    "    acc = accuracy_fun(inputs, targets)",
    "\n",
    "    test_acc += acc",
    "\n",
    "    test_batches += 1",
    "\n",
    "print(\"Final results:\")",
    "\n",
    "print(\"  test accuracy:\\t\\t{:.2f} %\".format(",
    "\n",
    "    test_acc / test_batches * 100))",
    "\n",
    "\n",
    "if test_acc / test_batches * 100 > 99:",
    "\n",
    "    print(\"Achievement unlocked: 80lvl Warlock!\")",
    "\n",
    "else:",
    "\n",
    "    print(\"We need more magic!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Report\n",
    "\n",
    "All creative approaches are highly welcome, but at the very least it would be great to mention\n",
    "* the idea;\n",
    "* brief history of tweaks and improvements;\n",
    "* what is the final architecture and why?\n",
    "* what is the training method and, again, why?\n",
    "* Any regularizations and other techniques applied and their effects;\n",
    "\n",
    "\n",
    "There is no need to write strict mathematical proofs (unless you want to).\n",
    " * \"I tried this, this and this, and the second one turned out to be better. And i just didn't like the name of that one\" - OK, but can be better\n",
    " * \"I have analized these and these articles|sources|blog posts, tried that and that to adapt them to my problem and the conclusions are such and such\" - the ideal one\n",
    " * \"I took that code that demo without understanding it, but i'll never confess that and instead i'll make up some pseudoscientific explaination\" - __not_ok__"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Hi, my name is `___ ___`, and here's my story\n",
    "\n",
    "A long ago in a galaxy far far away, when it was still more than an hour before deadline, i got an idea:\n",
    "\n",
    "##### I gonna build a neural network, that\n",
    "* brief text on what was\n",
    "* the original idea\n",
    "* and why it was so\n",
    "\n",
    "How could i be so naive?!\n",
    "\n",
    "##### One day, with no signs of warning,\n",
    "This thing has finally converged and\n",
    "* Some explaination about what were the results,\n",
    "* what worked and what didn't\n",
    "* most importantly - what next steps were taken, if any\n",
    "* and what were their respective outcomes\n",
    "\n",
    "##### Finally, after __  iterations, __ mugs of [tea/coffee]\n",
    "* what was the final architecture\n",
    "* as well as training method and tricks\n",
    "\n",
    "That, having wasted ____ [minutes, hours or days] of my life training, got\n",
    "\n",
    "* accuracy on training: __\n",
    "* accuracy on validation: __\n",
    "* accuracy on test: __\n",
    "\n",
    "\n",
    "[an optional afterword and mortal curses on assignment authors]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "language_info": {
   "name": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
